1.  Introductory  section

Batch  classification  is  used  in  selection  settings  where  the  data  from  a  number  of 
applicants  are  processed  in  order  to  decide  which  applicants  will  be  assigned  to  a 
number of different vacant jobs. Batch classification, in contrast to sequential
systems,  processes  the  data  of  a  whole  group  of  applicants  simultaneously.  This  is 
appropriate  in  settings  where  the  enlistment  is  organized  in  groups,  such  as  annual 
recruitments.  Modern  batch  classification  systems  are  generally  composed  of  two 
major  elements. 

The first element attempts to quantify the value of assigning a specific
person to a specific job or a certain type of job. In the military, similar jobs are
often labeled as Military Occupation Specialties (MOS) or as trades. The quantified
values are called payoff-values and can be computed in several ways. Multiple
linear regression (MLR) is widely used. In MLR models, the payoffs usually are
predicted performance scores on an external criterion that was used as the dependent
variable when designing the MLR model. Another method to produce payoff-values
is the Subject Matter Experts (SME) method. In this method, subject matter experts
are asked to give a specific weight to the selection variables for each MOS or trade.
The  payoffs  can  then  be  calculated  as  weighted  sums.  Artificial  Neural  Networks 
are  also  promising  tools  to  generate  payoff-values.  The  payoffs  are  computed  for  all 
person-job  combinations  and  usually  arranged  in  a  payoff-matrix  with  the 
applicants  as  rows  and  the  jobs  as  columns.  The  matrix  is  then  squared  by  adding 
dummy-jobs. 
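
The weighted-sum payoffs and the squaring step can be sketched as follows. This is only an illustration: the applicants, variables, weights and the dummy payoff of 0 are invented for the example and do not come from the Psychometric Model. In a real run each vacant position, not each trade, would be a column.

```python
# Hypothetical scores and SME weights (integers, to keep the arithmetic exact).
applicants = {
    "A": {"intelligence": 70, "fitness": 50},
    "B": {"intelligence": 55, "fitness": 80},
    "C": {"intelligence": 60, "fitness": 60},
}
weights = {
    "air_traffic_control": {"intelligence": 8, "fitness": 2},
    "airfield_defense": {"intelligence": 3, "fitness": 7},
}


def payoff(scores, job_weights):
    """Weighted sum of one applicant's scores for one job."""
    return sum(job_weights[var] * scores[var] for var in job_weights)


def square_payoff_matrix(applicants, weights, dummy_payoff=0):
    """Rows are applicants, columns are jobs; dummy jobs pad the matrix to square."""
    jobs = list(weights)
    n_dummy = len(applicants) - len(jobs)
    if n_dummy < 0:
        raise ValueError("this sketch only handles more applicants than jobs")
    columns = jobs + [f"dummy_{k + 1}" for k in range(n_dummy)]
    matrix = [
        [payoff(scores, weights[job]) for job in jobs] + [dummy_payoff] * n_dummy
        for scores in applicants.values()
    ]
    return columns, matrix


columns, matrix = square_payoff_matrix(applicants, weights)
for name, row in zip(applicants, matrix):
    print(name, dict(zip(columns, row)))
```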

When  the  payoff-matrix  is  ready,  the  second  major  element  of  the  classification 
model  is  used.  Since  the  matrix  was  squared  it  is  possible  to  link  each  applicant  to  a 
job  (a  real  one  or  a  dummy)  and  each  job  to  an  applicant.  That  can  be  done  by 
means  of  an  algorithm  that  maximizes  the  sum  of  the  payoff  values  identified  by 
linking  a  person  to  a  job.  This  classifies  the  applicants  and  also  identifies  the  ones 
who  are  selected  versus  the  ones  who  are  rejected. 
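
A minimal sketch of this second element is given below. It finds the assignment with the highest total payoff by trying every permutation, which is only workable for a handful of applicants. The paper does not say which algorithm the Psychometric Model uses; operational systems would normally rely on an efficient assignment method such as the Hungarian algorithm. The payoff values are hypothetical.

```python
from itertools import permutations


def best_assignment(payoffs):
    """Return (total, cols), where cols[i] is the column given to applicant i.

    Brute force over all one-to-one assignments of a square payoff matrix.
    """
    n = len(payoffs)
    best_total, best_cols = None, None
    for cols in permutations(range(n)):
        total = sum(payoffs[i][cols[i]] for i in range(n))
        if best_total is None or total > best_total:
            best_total, best_cols = total, cols
    return best_total, best_cols


# Hypothetical squared matrix: 3 applicants, 2 real jobs, 1 dummy job (last column).
payoffs = [
    [660, 560, 0],
    [600, 725, 0],
    [600, 600, 0],
]
total, cols = best_assignment(payoffs)
dummy_column = 2
for applicant, col in enumerate(cols):
    status = "rejected (dummy job)" if col == dummy_column else f"assigned to job {col}"
    print(f"applicant {applicant}: {status}")
```

An applicant who ends up linked to a dummy column is, in effect, rejected.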

2.  How  to  assess  the  quality  of  a  batch  classification  model? 

Any  organization  considering  or  using  a  batch  classification  system  will 
undoubtedly  want  to  assess  its  quality.  But  how  should  we  express  this  quality?  To 
begin  with,  it  is  important  to  note  that  the  outcome  of  such  a  classification  model 
depends  on  quite  a  number  of  aspects.  Let  us  briefly  review  some  of  them. 

The outcome is related to the applicant group. The selection ratio, together with the
level and distribution of relevant aptitudes and characteristics in the group, is
obviously of paramount importance.

The  outcome  is  also  related  to  the  vacant  jobs.  These  do  not  only  affect  the  selection 
ratio  but  also  have  a  certain  level  of  differentiation  as  to  their  attractiveness  and  the 
level  and  profile  of  aptitudes  and  characteristics  they  require.  In  general,  the  more 
differentiated  the  jobs  are,  the  more  powerful  the  effect  of  the  classification 
algorithm  will  be. 

The outcome also depends heavily on the payoff computation. The quality of the
payoffs  depends  on  things  such  as  the  measurement  quality  of  the  variables  used  in 
the  model  and  their  differential  validity,  the  judicious  setting  of  the  weights  and  the 
integration  of  metric  and  categorical  data  and  preferences. 

Finally, the classification outcome is conditioned by the chosen objective function
and the algorithm used.

The complexity inherent in a batch classification system makes it rather
inappropriate  to  summarize  its  quality  by  a  single  overall  number.  In  many  cases 
the  practitioner  will  be  better  off  with  a  series  of  indicators  each  focusing  on  a 
specific  aspect  of  the  classification  quality.  Such  indicators  are  indeed  available  and 
can  be  grouped  according  to  the  moment  at  which  they  can  be  obtained. 

Some  indicators  depend  on  data  that  are  not  available  at  the  time  the  classification 
algorithm  is  performed.  These  criterion  data  typically  comprise  attrition  rates  and 
performance  measurements.  Quality  indicators  based  on  such  data  include 
predictive  validity  coefficients  of  the  payoff-values,  differential  validity  of 
predictors,  logistic  regression  models  against  pass-fail  criteria,  cross  checks  of  the 
used  linear  models,  etc.  Such  quality  indicators  can  be  called  delayed  or  a  posteriori 
indicators. 

Other quality indicators rely only on data that are available immediately
after the classification algorithm has run. These can be labeled a priori or immediate
quality indicators. Given the title of this paper, we will concentrate our attention on
these. These indicators are less powerful than the ones relying on criterion data and
cannot provide the practitioner with final statements concerning the quality of the
system used, but they offer one tremendous advantage: they allow him or her to modify
certain parameters used in the classification model before the assignment decisions
are carried out. In other words, these indicators make it possible to detect problems in
the classification outcome and to rectify them by altering the parameters of the
classification system. The classification model can subsequently be rerun until the
classification quality is acceptable. It is only at that time that the applicants are
informed of the outcome.

We’ll now review some immediate quality indicators. To illustrate them, we’ll also
present some screen views from the Measures of Merit module of the
Psychometric Model, which is the batch classification model currently used in the
Belgian Armed Forces. The examples come from the classification for the annual
Flemish non-commissioned officer recruitment in July 1998.

2.1.  The  fill  rate.

The  first  indicator  is  the  fill  rate.  An  important  issue  is  whether  or  not  the  vacant 
jobs  will  be  filled.  If  the  classification  model  doesn’t  find  suitable  applicants  for  all 
jobs,  how  many  and  which  jobs  are  then  left  vacant?  Did  the  algorithm  have  a  lot  of 
choice  to  fill  a  certain  MOS?  Are  there  applicants  who  didn’t  get  a  job  but  remain 
available in the event that another candidate declines the job he or she was
assigned to? These questions can easily be answered, for instance by a table like the
one presented in the following figure.


Figure: Psychometric Model: Assignment evaluation

The first three columns in this table identify the jobs. The next ones give the number of vacant jobs
(NUM_JOBS) and the number of persons assigned to them (NUM_Assign). The column ‘Shortfall’ indicates the
number of positions that couldn’t be filled. The last two columns give the number of applicants who were
eligible for the job (that is, who met all criteria and therefore got a payoff-value for that job) and the number of
applicants still ‘Available’ after the assignment. Those are the ones who have an acceptable payoff but weren’t
selected in the first place. If the user wants to remedy a shortfall, he or she can lower some thresholds that reject
a large number of applicants for that trade, or artificially increase the payoffs for the trade so that the algorithm
will direct applicants preferentially to it. A large number of ‘available’ persons, on the other hand, offers the
possibility to increase certain minimum thresholds when that is believed to be desirable. One should note,
however, that there is usually a lot of overlap between the groups of ‘available’ persons for different trades.
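
The columns of such a table could be derived along the following lines. This is a sketch under assumptions: the data structures, the trade codes and the applicants are hypothetical, ‘available’ is interpreted as eligible for the trade and not assigned to any job, and the extra Fill_rate column (assigned divided by vacant) is added here for illustration.

```python
def fill_rate_table(vacancies, assignment, eligible):
    """Build one row per trade with the counts described above.

    vacancies:  {trade: number of vacant jobs (assumed > 0)}
    assignment: {applicant: trade, or None when not assigned}
    eligible:   {trade: set of applicants who received a payoff for it}
    """
    unassigned = {a for a, trade in assignment.items() if trade is None}
    rows = {}
    for trade, num_jobs in vacancies.items():
        num_assign = sum(1 for t in assignment.values() if t == trade)
        rows[trade] = {
            "NUM_JOBS": num_jobs,
            "NUM_Assign": num_assign,
            "Shortfall": num_jobs - num_assign,
            "Eligible": len(eligible[trade]),
            # Assumption: 'available' = eligible for this trade, assigned nowhere.
            "Available": len(eligible[trade] & unassigned),
            "Fill_rate": num_assign / num_jobs,
        }
    return rows


# Hypothetical outcome of one classification run.
vacancies = {"240": 2, "250": 2}
assignment = {"A": "240", "B": "240", "C": "250", "D": None, "E": None}
eligible = {"240": {"A", "B", "D", "E"}, "250": {"C"}}

table = fill_rate_table(vacancies, assignment, eligible)
for trade, row in table.items():
    print(trade, row)
```

In this invented run, trade 250 shows a shortfall of one with nobody available, which is the situation where lowering a threshold for that trade would be considered.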

2.2.  The  Mean  Predicted  Performance. 

The  second  quality  indicator  is  the  Mean  Predicted  Performance  (MPP).  Given  that  the  payoff-values  are 
computed  using  a  model  based  on  the  relationship  between  predictors  and  performance  (such  as  the  multiple 
linear  regression  model),  it  becomes  possible  to  estimate  the  later  performance  of  an  individual  in  a  specific 
trade. After the classification model has run, one can compute the MPP for each trade and compare those with the known
average performance in the same trades. This quality indicator requires stable prediction models and those are
not always available. Its diagnostic power tends to be low as well.
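
As a rough illustration, the MPP per trade can be computed as the mean of each assigned applicant's predicted performance in the trade he or she received. The predicted scores and the "known" averages below are invented, and the data layout is an assumption of this sketch.

```python
from statistics import mean


def mean_predicted_performance(assignment, predicted):
    """Average predicted performance of the applicants assigned to each trade.

    assignment: {applicant: trade, or None when not assigned}
    predicted:  {applicant: {trade: predicted performance in that trade}}
    """
    by_trade = {}
    for applicant, trade in assignment.items():
        if trade is not None:
            by_trade.setdefault(trade, []).append(predicted[applicant][trade])
    return {trade: mean(values) for trade, values in by_trade.items()}


# Hypothetical predictions (for example from an MLR model) and assignment.
predicted = {
    "A": {"240": 14, "250": 11},
    "B": {"240": 12, "250": 13},
    "C": {"240": 9, "250": 10},
    "D": {"240": 8, "250": 7},
}
assignment = {"A": "240", "B": "240", "C": "250", "D": None}
known_average = {"240": 12, "250": 11}  # hypothetical historical averages

mpp = mean_predicted_performance(assignment, predicted)
gap = {trade: mpp[trade] - known_average[trade] for trade in mpp}
print(mpp, gap)
```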

2.3.  Descriptive  statistics  for  the  groups  assigned  to  trades. 

Another approach to the classification quality is based on the descriptive statistics of the groups of applicants
that  are  assigned  to  the  different  trades.  Aptitudes  and  other  characteristics  measured  at  the  interval  or  ratio  level 
can  be  summarized  by  their  average  whereas  categorical  data  can  be  shown  in  contingency  tables. 


Figure: Psychometric Model: Evaluation of means

The three columns on the left side present the name of the variable and its theoretical minimum and maximum
values. The next three columns show the averages for the variable in the row for all applicants in the model
(ALL), all assigned applicants (ALL_ASS) and all applicants that were not assigned to a job (ALL_NOT). The
remaining columns show the average of the row-variable for the applicants assigned to the jobs identified by the
column-header. When examining the variable ST_PINP for instance (standardized intelligence measurement),
one can see that the group of assigned persons has an average of 68.4 whereas the not-assigned group has only
44.7. The persons assigned to the job ‘2’ even have an average of 75.9.
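
One row of such a table could be produced as sketched below. The scores and the assignment are hypothetical and are not the 1998 figures quoted above; only the column names ALL, ALL_ASS and ALL_NOT follow the screen.

```python
from statistics import mean


def evaluation_of_means(scores, assignment, variable):
    """Return one table row: group averages of `variable`.

    scores:     {applicant: {variable: value}}
    assignment: {applicant: job id, or None when not assigned}
    """
    def avg(group):
        values = [scores[a][variable] for a in group]
        return mean(values) if values else None  # None for an empty group

    assigned = [a for a, job in assignment.items() if job is not None]
    not_assigned = [a for a, job in assignment.items() if job is None]
    row = {
        "ALL": avg(scores),
        "ALL_ASS": avg(assigned),
        "ALL_NOT": avg(not_assigned),
    }
    for job in sorted({j for j in assignment.values() if j is not None}):
        row[job] = avg([a for a in assigned if assignment[a] == job])
    return row


# Hypothetical standardized intelligence scores.
scores = {
    "A": {"ST_PINP": 80},
    "B": {"ST_PINP": 70},
    "C": {"ST_PINP": 60},
    "D": {"ST_PINP": 40},
    "E": {"ST_PINP": 50},
}
assignment = {"A": "1", "B": "2", "C": "2", "D": None, "E": None}

row = evaluation_of_means(scores, assignment, "ST_PINP")
print(row)
```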

This  table  is  very  useful  to  compare  the  assigned  group  versus  the  not-assigned  group  to  see  the  selection-effect 
of  the  classification  model  on  each  variable.  This  table  also  contains  the  necessary  data  to  compare  the  averaged 
aptitude  profiles  for  different  groups.  Such  a  table  however  is  not  very  user-friendly  for  that  purpose.  That  is  the 
reason why another - graphical - instrument was developed. The next figure presents it.


This screen makes it very easy to generate graphs. The user can choose any metric variables he or she wants and
then  select  certain  profiles.  These  profiles  can  include  any  individual  applicant,  groups  assigned  to  a  specified 
Job-ID  or  MOS  and  the  three  reference  groups:  all  assigned,  all  not-assigned  and  all  applicants  in  the  model. 

In this example, some average aptitudes are compared for the groups assigned to the MOS Air Traffic Control
(Profile 1, MOS 240) and Airfield Defense (Profile 2, MOS 250). On average, the Air Traffic Control group
scores higher on General Intelligence (ST_PINP) and Technical English (ST_ENG_T) and lower on Physical
Fitness (ST_PHYS). The personality score (ST_KAHO) of both groups is similar. Since this is in accordance
with what was desired, no corrective action is required.

Both previous screens focused on metric data. For categorical data, one can check the frequencies of the different
variable-classes  for  several  relevant  groups. 


Figure: Psychometric Model: Evaluation of frequencies


The  left  column  in  this  table  exhibits  the  categorical  variable  name  and  the  second  column  shows  the  different 
categories  or  classes  of  that  variable.  The  remaining  columns  contain  the  observed  frequencies  of  the 
variable-class  in  the  row  for  different  groups:  the  three  reference  groups  and  the  groups  assigned  to  the  jobs  in 
the  column  header. 

When looking at the variable ‘FAC_P’ for instance, which describes general medical fitness with three
classes (1-2-3) that do not exclude the candidate, we notice that no applicant got a FAC_P of 1, 305 applicants
got a 2 and 61 of them got a 3. Looking further and using some elementary statistics, we can say that the odds
of being assigned rather than not assigned are at least 2.5 times higher for the FAC_P 2 candidates than for the
FAC_P 3 applicants (lower bound of the 95% exact confidence interval). This can be related to the coefficients
used for the classes of the variable to check whether the outcome is desirable.
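
The odds comparison can be illustrated with a small sketch. Two caveats apply. The counts below are invented, because the paper does not report how the 305 and 61 applicants split into assigned and not assigned. The interval computed here is the simpler Wald (normal approximation) interval on the log odds ratio, not the exact interval mentioned in the text.

```python
from math import exp, log, sqrt


def odds_ratio_wald(a, b, c, d, z=1.959964):
    """Odds ratio of a 2x2 table with an approximate 95% Wald interval.

    a, b: assigned / not assigned in the first class
    c, d: assigned / not assigned in the second class
    All four counts must be greater than zero.
    """
    odds_ratio = (a * d) / (b * c)
    se = sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # standard error of log(odds ratio)
    lower = exp(log(odds_ratio) - z * se)
    upper = exp(log(odds_ratio) + z * se)
    return odds_ratio, lower, upper


# Hypothetical counts: class 2 has 60 assigned and 40 not assigned,
# class 3 has 10 assigned and 30 not assigned.
odds_ratio, lower, upper = odds_ratio_wald(60, 40, 10, 30)
print(f"odds ratio {odds_ratio:.2f}, 95% Wald interval {lower:.2f} to {upper:.2f}")
```

A lower bound above 1 indicates that the first class is favored by the classification outcome beyond what chance alone would suggest.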

2.4.  Respect  of  the  applicant’s  preferences. 

A modern classification system shouldn’t be based on aptitudes only but needs to include the expressed
preferences of the applicants as well. When this is the case, it will be of interest to see to what extent the
classification model respected the preferences of the applicants. In the Psychometric Model, the applicants are
requested to express their preference for each trade on a 1 to 99 scale. As a quality indicator for the
classification model, we’ll compute the average preference for a specific trade within the group of applicants that
is assigned to that same trade.

Figure: Psychometric Model: Evaluation of MOS preferences

In this table, the cell values represent the average preference of the group indicated in the column header, for the
MOS in the left column. The column ‘ALL’ indicates the popularity of a MOS. The most relevant cells are
highlighted. They represent the preference for a MOS as expressed by the group assigned to that MOS. Low
values indicate to a certain extent that the applicants assigned to that trade didn’t really want this trade. Very
high values could result from giving too much weight to the choices of the applicants, perhaps at the expense of
not taking their aptitudes sufficiently into consideration. Problems discovered through this table can be corrected by
adapting the weight given to the preferences of the applicants.
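
A table of this kind could be computed as in the following sketch. The preferences and the assignment are hypothetical; the "diagonal" entries correspond to the highlighted cells, that is, the preference for a MOS among those assigned to it.

```python
from statistics import mean


def preference_table(preferences, assignment):
    """Average preference for each MOS (rows) per group (columns).

    preferences: {applicant: {mos: preference on the 1 to 99 scale}}
    assignment:  {applicant: mos, or None when not assigned}
    """
    all_mos = sorted({m for prefs in preferences.values() for m in prefs})
    groups = {"ALL": list(preferences)}
    for applicant, mos in assignment.items():
        if mos is not None:
            groups.setdefault(mos, []).append(applicant)
    return {
        mos: {g: mean(preferences[a][mos] for a in members)
              for g, members in groups.items()}
        for mos in all_mos
    }


# Hypothetical preferences for two MOS.
preferences = {
    "A": {"240": 90, "250": 20},
    "B": {"240": 80, "250": 40},
    "C": {"240": 30, "250": 70},
    "D": {"240": 60, "250": 50},
}
assignment = {"A": "240", "B": "240", "C": "250", "D": None}

table = preference_table(preferences, assignment)
# Highlighted cells: preference for a MOS within the group assigned to it.
diagonal = {mos: row[mos] for mos, row in table.items() if mos in row}
print(table)
print(diagonal)
```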

2.5.  Respect  of  set  profiles. 

The  following  quality  indicator  attempts  to  check  whether  the  profiles  defined  by  the  weights  used  to  compute 
the  payoffs  for  a  trade,  correspond  to  the  aptitude  profiles  of  the  applicants  assigned  to  that  trade.  To  do  so,  one 
needs to consider the variables used to calculate the payoffs for a specific trade. If one standardizes these over all
the acceptable applicants to a common mean and variance, and then takes the average of these standardized
variables for the group of applicants assigned to that trade, one can see the departure from the overall mean as an
indicator of the weight actually given to the variable in the model. It is further possible to express these
trade-averages  and  the  weights  used  to  compute  the  payoffs  on  the  same  scale  and  to  compare  them  pairwise. 
This  can  be  done  graphically  or  by  means  of  correlations. 
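
The comparison by means of correlations might look as follows. This is a sketch under assumptions: the scores, the weights and the choice of the population standard deviation for standardizing are all illustrative. Because the Pearson correlation is unaffected by the scale of either series, no explicit rescaling is needed for this variant.

```python
from statistics import mean, pstdev


def standardized_group_means(scores, group, variables):
    """Mean z-score of `group` on each variable, standardized over all applicants."""
    result = {}
    for var in variables:
        values = [s[var] for s in scores.values()]
        mu, sigma = mean(values), pstdev(values)
        result[var] = mean([(scores[a][var] - mu) / sigma for a in group])
    return result


def pearson(x, y):
    """Pearson correlation of two equally long sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


variables = ["ST_PINP", "ST_ENG_T", "ST_PHYS"]
# Hypothetical scores of all acceptable applicants.
scores = {
    "A": {"ST_PINP": 80, "ST_ENG_T": 70, "ST_PHYS": 40},
    "B": {"ST_PINP": 70, "ST_ENG_T": 80, "ST_PHYS": 50},
    "C": {"ST_PINP": 50, "ST_ENG_T": 40, "ST_PHYS": 70},
    "D": {"ST_PINP": 40, "ST_ENG_T": 50, "ST_PHYS": 80},
}
trade_weights = {"ST_PINP": 5, "ST_ENG_T": 4, "ST_PHYS": 1}  # hypothetical
assigned_to_trade = ["A", "B"]

profile = standardized_group_means(scores, assigned_to_trade, variables)
r = pearson([profile[v] for v in variables], [trade_weights[v] for v in variables])
print(profile, round(r, 3))
```

A correlation close to 1 suggests that the assigned group resembles the profile set by the weights; a low or negative value would call for a closer look at the weights or thresholds.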


2.6.  Specificity  of  set  profiles. 

The  last  proposed  quality  indicator  consists  of  the  correlation  matrix  of  the  payoffs.  Highly  positively  correlated 
payoffs  indicate  a  possible  lack  of  differentiation  between  the  requested  aptitude  profiles.  If  the  concerned  trades 
are not considered to be very similar, one should try to identify means to discriminate between them and to
incorporate  these  in  the  classification  model. 
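
A possible way to screen the payoff correlations is sketched below. The payoff columns are invented, and the threshold of 0.9 for flagging a pair of trades is an arbitrary choice for this example, not a value taken from the paper.

```python
from itertools import combinations
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equally long sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def payoff_correlations(payoffs):
    """Correlation for every pair of trades; payoffs is {trade: [payoff per applicant]}."""
    return {
        (t1, t2): pearson(payoffs[t1], payoffs[t2])
        for t1, t2 in combinations(sorted(payoffs), 2)
    }


# Hypothetical payoff columns for the same four applicants.
payoffs = {
    "240": [66, 60, 60, 50],
    "250": [56, 72, 60, 65],
    "260": [65, 61, 59, 51],
}
correlations = payoff_correlations(payoffs)
threshold = 0.9  # arbitrary cut-off for this illustration
flagged = [pair for pair, r in correlations.items() if r > threshold]
print(flagged)
```

Here the invented trades 240 and 260 are flagged, which would prompt the question whether they really require the same aptitude profile.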

3.  Future  directions 

When using the immediate quality indicators as described, a practitioner can get a very accurate idea of the
quality of the batch classification system used. Such a quality assessment, however, still requires a good amount
of expertise. It is therefore recommended to develop expert systems that detect problems and suggest ways to
correct them, in order to assist the users of such classification systems.